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Abstract 

The canonical problem of solving a system of linear equations arises in numerous contexts in 
information theory, communication theory, and related fields. In this contribution, we develop a so- 
lution based upon Gaussian belief propagation (GaBP) that does not involve direct matrix inversion. 
The iterative nature of our approach allows for a distributed message-passing implementation of the 
solution algorithm. We address the properties of the GaBP solver, including convergence, exactness, 
computational complexity, message-passing efficiency and its relation to classical solution methods. 
We use numerical examples and applications, like linear detection, to illustrate these properties 
through the use of computer simulations. This empirical study demonstrates the attractiveness (e.g. , 
faster convergence rate) of the proposed GaBP solver in comparison to conventional linear-algebraic 
iterative solution methods. 
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I. Introduction 

Solving a system of linear equations Ax = b is one of the most fundamental problems in 
algebra, with countless applications in the mathematical sciences and engineering. Given an 
observation vector b G and the data matrix A G R"^^" (m > n G Z), a unique solution, 
X = X* G M", exists if and only if the data matrix A has full column rank. Assuming a 
nonsingular matrix A, the system of equations can be solved either directly or in an iterative 
manner. Direct matrix inversion methods, such as Gaussian elimination (LU factorization, [1]- 
Ch. 3) or band Cholesky factorization ( [1]-Ch. 4), find the solution with a finite number of 
operations, typically, for a dense nx n matrix, of the order of n^. The former is particularly 
effective for systems with unstructured dense data matrices, while the latter is typically used 
for structured dense systems. 

Iterative methods [2] are inherently simpler, requiring only additions and multiplications, 
and have the further advantage that they can exploit the sparsity of the matrix A to reduce the 
computational complexity as well as the algorithmic storage requirements [3]. By comparison, 
for large, sparse and amorphous data matrices, the direct methods are impractical due to the 
need for excessive matrix reordering operations. 

The main drawback of the iterative approaches is that, under certain conditions, they 
converge only asymptotically to the exact solution x* [2]. Thus, there is the risk that they 
may converge slowly, or not at all. In practice, however, it has been found that they often 
converge to the exact solution or a good approximation after a relatively small number of 
iterations. 

A powerful and efficient iterative algorithm, belief propagation (BP, [4]), also known 
as the sum-product algorithm, has been very successfully used to solve, either exactly or 
approximately, inference problems in probabilistic graphical models [5]. 

In this paper, we reformulate the general problem of solving a linear system of algebraic 

equations as a probabilistic inference problem on a suitably-defined graph. We believe that 

this is the first time that an explicit connection between these two ubiquitous problems has 
fl 

been establisheqj. Furthermore, for the first time, we provide a full step-by-step derivation 
of the GaBP algorithm from the belief propagation algorithm. 

'Recently, we have found out the work of Moallemi and Van Roy [6] which discusses the connection between the Min- 
Sum message passing algorithm and solving quadratic programs. Both works [6], [7] where published in parallel, and the 
algorithms where derived independently, using different techniques. In Appendix IIVI we discuss the connection between the 
two algorithms, and show they are equivalent. 



As an important consequence, we demonstrate that Gaussian BP (GaBP) provides an effi- 
cient, distributed approach to solving a linear system that circumvents the potentially complex 
operation of direct matrix inversion. Using the seminal work of Weiss and Freeman [8] and 
some recent related developments [6], [7], [9]-[12], we address the convergence and exactness 
properties of the proposed GaBP solver. Other properties of the GaBP solver, as computational 
complexity, message-passing efficiency and its relation to classical solution methods are also 
investigated. 

As application of this new approach to solving a system of linear equations, we con- 
sider the problem of linear detection using a decorrelator in a code-division multiple-access 
(CDMA) system. Through the use of the iterative message-passing formulation, we implement 
the decorrelator detector in a distributed manner. This example allows us to quantitatively 
compare the new GaBP solver with the classical iterative solution methods that have been 
previously investigated in the context of a linear implementation of CDMA demodula- 
tion [13]-[15]. We show that the GaBP-based decorrelator yields faster convergence than 
these conventional methods. Furthermore, the GaBP convergence is further accelerated by 
incorporating the linear- algebraic methods of Aitken and Steffensen [16] into the GaBP-based 
scheme. As far as we know, this represents the first time these acceleration methods have 
been examined within the framework of message-passing algorithms. 

The paper is organized as follows. We introduce the problem model in Section |Ill In 
Section Unl we derive the distributed GaBP-based solution method and address its convergence 
and exactness properties in Section |lVl We discuss the algorithm complexity and message 
passing efficiency in Section |Vl The relation to classical linear algebra solution methods 
is explored in Section |VIl In Section IVIIIl we outline numerical examples and use the 
linear detection problem to illustrate experimentally the superior convergence rate of the 
GaBP solver, relative to conventional iterative methods. Concluding remarks are presented 
in Section |Xl 

II. Preliminaries: Notations and Definitions 

A. Linear Algebra 

We shall use the following linear- algebraic notations and definitions. The operator {-j^ 
stands for a vector or matrix transpose, the matrix I„ is an n x n identity matrix, while the 
symbols {■}i and {■}ij denote entries of a vector and matrix, respectively. Let M G M"^" be 
a real symmetric square matrix and A G M™^" be a real (possibly rectangular) matrix. Let 



Definition 1 (Pseudoinverse): The Moore-Penrose pseudoinverse matrix of the matrix A, 
denoted by is defined as 

At 4 (A^A)"'a^. (1) 
Definition 2 (Spectral radius): The spectral radius of the matrix M, denoted by p(M), is 
defined to be the maximum of the absolute values of the eigenvalues of M, i.e. , 

p(M) 4 max(|A,|), (2) 

l<i<s 

where Ai, . . . are the eigenvalues of the matrix M. 
Definition 3 (Diagonal dominance): The matrix M is 



1) weakly diagonally dominant if 

\Mii\>J2\MiJ\,y^, (3) 

2) strictly diagonally dominant if 

|M,,|>^|M,,|,Vz, (4) 

3) irreducibly diagonally dominant if M is irreducible!, and 

> ^|iV%|,Vz, (5) 

with strict inequality for at least one i. 
Definition 4 (PSD): The matrix M is positive semi-definite (PSD) if and only if for all 
non-zero real vectors z G M", 

z^Mz > 0. (6) 
Definition 5 (Residual): For a real vector x G M", the residual, r = r(x) G M™, of a linear 
system is r = Ax — b. 

The standard norm of the residual, | |r| \p(p = 1,2, ... , oo), is a good measure of the accuracy 
of a vector x as a solution to the linear system. In our experimental study, the Frobenius 
norm (i.e. , p = 2) per equation is used, ||r||i7/m = ^/Xli^i 

Definition 6: The condition number, k, of the matrix M is defined as 

«:p^||M||p||M||pi. (7) 



^ A matrix is said to be reducible if there is a permutation matrix P such that PMP"'^ is block upper triangular. Otherwise, 
it is irreducible. 



(8) 



For M being a normal matrix (i.e. , M^M = MM^), the condition number is given by 

-^max 

K = K2= , 

^min 

where A^ax and Amin are the maximal and minimal eigenvalues of M, respectively. 

Even though a system is nonsingular it could be ill-conditioned. Ill-conditioning means 
that a small perturbation in the data matrix A, or the observation vector b, causes large 
perturbations in the solution, x*. This determines the difficulty of solving the problem. The 
condition number is a good measure of the ill-conditioning of the matrix. The better the 
conditioning of a matrix the condition number is smaller, going to unity. The condition 
number of a non-invertible (singular) matrix is set arbitrarily to infinity. 

B. Graphical Models 

We will make use of the following terminology and notation in the discussion of the 
GaBP algorithm. Given the data matrix A and the observation vector b, one can write 
explicitly the Gaussian density function, p(x), and its corresponding graph Q consisting of 
edge potentials (compatibility functions) ipij and self potentials ('evidence') (pi. These graph 
potentials are simply determined according to the following pairwise factorization of the 
Gaussian function ([TT]) 

n 

Xi , Xj 

), (9) 

resulting in ■ipij{xi,Xj) = exp^—XiAijXj) and (pi{xi) = exp (biXi — Aiixf/2). The edges set 
{i,j} includes all non-zero entries of A for which i > j. The set of graph nodes N{i) denotes 
the set of all the nodes neighboring the ith node (excluding node i). The set excludes 
the node j from N(z). 

C. Problem Formulation 

Let A G R'"^" (m, n E W) be a full column rank, mx n real-valued matrix, with m > n, 
and let b G be a real-valued vector. Our objective is to efficiently find a solution x* to 
the linear system of equations Ax = b given by 

X* = A^b. (10) 

Throughout the development in this contribution, we make the following assumption. 
Assumption 7: The matrix A is square (i.e. , m = n) and symmetric. 





For the case of square matrices the pseudoinverse matrix is nothing but the data matrix inverse, 
i.e., = A^^ For any linear system of equations with a unique solution. Assumption |7] 
conceptually entails no loss of generality, as can be seen by considering the invertible 
system defined by the new symmetric (and PSD) matrix A'^ nxm-^mxn ^ -^nxn and vector 
A^nxmbmxi ^ ^nxi- Howcvcr, this transformation involves an excessive computational 
complexity of 0{n^m) and 0{nm) operations, respectively. Furthermore, a sparse data matrix 
may become dense due to the transformation, an undesired property as far as complexity 
concerns. Thus, we first limit the discussion to the solution of the popular case of square 
matrices. In Section IVIIl the proposed GaBP solver is extended to the more general case of 
linear systems with rectangular m x n full rank matrices. 

III. The GaBP-Based Solver Derivation 
A. From Linear Algebra to Probabilistic Inference 

We begin our derivation by defining an undirected graphical model (i.e. , a Markov random 
field), Q, corresponding to the linear system of equations. Specifically, let Q = (X, £), where 
X is a set of nodes that are in one-to-one correspondence with the linear system's variables 
X = {xi, . . . ,Xn}^, and where £ is a set of undirected edges determined by the non-zero 
entries of the (symmetric) matrix A. 

Using this graph, we can translate the problem of solving the linear system from the 
algebraic domain to the domain of probabilistic inference, as stated in the following theorem. 

Proposition 8: The computation of the solution vector x* is identical to the inference of 
the vector of marginal means /i = {/ii, . . . over the graph Q with the associated joint 
Gaussian probability density function p{-x.) ~ A/'(/i, A"^). 

Proof: [Proof] Another way of solving the set of linear equations Ax — b = is to 
represent it by using a quadratic form g(x) = x^Ax/2 — b^x. As the matrix A is symmetric, 
the derivative of the quadratic form w.r.t. the vector x is given by the vector dq/dx = Ax— b. 
Thus equating dq/dx = gives the global minimum x* of this convex function, which is 
nothing but the desired solution to Ax = b. 

Next, one can define the following joint Gaussian probability density function 

p(x) = Z-^ exp ( - g(x)) = Z'^ exp (-x^Ax/2 + b^x), (1 1) 

where Z is a distribution normalization factor. Denoting the vector /i = A^^b, the Gaussian 



density function can be rewritten as 



p(x) 



X 



exp (-x'^Ax/2 + //^Ax - /i'^A/i/2) 
C-^exp(-(x-/if A(x-/i)/2) 



Ar(/i, A 



1 



(12) 



where the new normalization factor C = Zexp (— /i^Ayu/2). It follows that the target solution 
X* = A^^b is equal to /i = A^^b, the mean vector of the distribution p(x), as defined 
above (fTTl) . 

Hence, in order to solve the system of linear equations we need to infer the marginal 
densities, which must also be Gaussian, p{xi) ^ Af{fii = {A^^bjj, = {A^^ju), where 
Hi and Pi are the marginal mean and inverse variance (sometimes called the precision), 
respectively. ■ 

According to Proposition [8l solving a deterministic vector-matrix linear equation translates 
to solving an inference problem in the corresponding graph. The move to the probabilistic 
domain calls for the utilization of BP as an efficient inference engine. 

Remark 9: Defining a jointly Gaussian probability density function, immediately yields an 
implicit assumption on the positive semi-definiteness of the precision matrix A, in addition 
to the symmetry assumption. However, we would like to stress out that this assumption 
emerges only for exposition purposes, so we can use the notion of 'Gaussian probability', 
but the derivation of the GaBP solver itself does not use this assumption. See the numerical 
example of the exact GaBP-based solution of a system with a symmetric, but not positive 
semi-definite, data matrix A in Section IVIII-CI 

B. Belief Propagation 

Belief propagation (BP) is equivalent to applying Pearl's local message-passing algo- 
rithm [4], originally derived for exact inference in trees, to a general graph even if it contains 
cycles (loops). BP has been found to have outstanding empirical success in many applications, 
e.g. , in decoding Turbo codes and low-density parity-check (LDPC) codes. The excellent 
performance of BP in these applications may be attributed to the sparsity of the graphs, 
which ensures that cycles in the graph are long, and inference may be performed as if it 
were a tree. 



s 



The BP algorithm functions by passing real-valued messages across edges in the graph and 
consists of two computational rules, namely the 'sum-product rule' and the 'product rule'. 
In contrast to typical applications of BP in coding theory [17], our graphical representation 
resembles to a pairwise Markov random field [5] with a single type of propagating messages, 
rather than a factor graph [18] with two different types of messages, originated from either 
the variable node or the factor node. Furthermore, in most graphical model representations 
used in the information theory literature the graph nodes are assigned with discrete values, 
while in this contribution we deal with nodes corresponding to continuous variables. Thus, 
for a graph Q composed of potentials tpij and (pi as previously defined, the conventional 
sum-product rule becomes an integral-product rule [8] and the message mij{xj), sent from 
node i to node j over their shared edge on the graph, is given by 



where the scalar a is a normalization constant. Note that the propagating messages (and the 
graph potentials) do not have to describe valid (i.e. , normalized) density probability functions, 
as long as the inferred marginals do. 

C. The Gaussian BP Algorithm 

Gaussian BP is a special case of continuous BP, where the underlying distribution is 
Gaussian. Now, we derive the Gaussian BP update rules by substituting Gaussian distributions 
into the continuous BP update equations. 

Given the data matrix A and the observation vector b, one can write explicitly the Gaussian 
density function, p(x) (fT2l) . and its corresponding graph Q. Using the graph definition and 
a certain (arbitrary) pairwise factorization of the Gaussian function (fT2l) . the edge potentials 
(compatibility functions) and self potentials ('evidence') (pi are determined to be 




(13) 



p{xi) = a(pi{xi) Yl mki{xi), 

fceN(i) 



(14) 



V^-iji' (•^i? 6Xp(^ X^-A^ji'Xj), 

(pi{xi) = exp [biXi - AiiX^j2), 



(15) 



(16) 



respectively. Note that by completing the square, one can observe that 



(pi{xi) (X Afifiii = hi/ An, p.. 



1-1 




(17) 



The graph topology is specified by the structure of the matrix A, i.e. the edges set {i,j} 
includes all non-zero entries of A for which i > j. 

Before describing the inference algorithm performed over the graphical model, we make 
the elementary but very useful observation that the product of Gaussian densities over a 
common variable is, up to a constant factor, also a Gaussian density. 

Lemma 10: Let fi{x) and f2{x) be the probability density functions of a Gaussian random 
variable with two possible densities A/'(/ii,Pf^) and A/'(/i2, -^2^), respectively. Then their 
product, f(x) = fi{x)f2{x) is, up to a constant factor, the probability density function of a 
Gaussian random variable with distribution J\f{fi, P^^), where 

/i = p-l(Pi/ii+P2/i2), (18) 

= (Pi + P2)-^ (19) 
The proof of this lemma is found in Appendix HI 




k e N{i)\J 



Fig. 1. Belief propagation message flow 



Fig. \T\ plots a portion of a certain graph, describing the neighborhood of node i. Each 
node (empty circle) is associated with a variable and self potential </), which is a function 
of this variable, while edges go with the pairwise (symmetric) potentials ^. Messages 
are propagating along the edges on both directions (only the messages relevant for the 
computation of niij are drawn in Fig. [T]). Looking at the right hand side of the integral-product 
rule (fT3l) . node i needs to first calculate the product of all incoming messages, except for the 
message coming from node j. Recall that since p(x) is jointly Gaussian, the factorized self 
potentials (f)i{xi) oc J\f{fiii,P^^) (fTTl) and similarly all messages mki{xi) oc Af{fikh Pki^) are 
of Gaussian form as well. 



As the terms in the product of the incoming messages and the self potential in the integral- 
product rule (fT3]) are all a function of the same variable, Xj (associated with the node i), 
then, according to the multivariate extension of Lemma [TOl 

<f^ii^i) n (20) 

fceN(i)\j 

is proportional to a certain Gaussian distribution, J\f , P^j) . Applying the multivariate 
version of the product precision expression in (fT9]) . the update rule for the inverse variance 
is given by (over-braces denote the origin of each of the terms) 

P^\,=^+ J2 (21) 
keN(i)\j 

where Pa = An is the inverse variance a-priori associated with node i, via the precision of 
(piixi), and Pki are the inverse variances of the messages niki{xi). Similarly using (fTSi) for 
the multivariate case, we can calculate the mean 

~ ( ^iif^ii + Pki^-ki j , (22) 

k&i{i)\j 

where fiu = hi/ An is the mean of the self potential and /i^j are the means of the incoming 
messages. 

Next, we calculate the remaining terms of the message mjj(xj), including the integration 
over Xi. After some algebraic manipulation (deferred to Appendix UT]), using the Gaussian 
integral 

/oo 
exp {—ax^ + hx)dx = a/ vr/aexp (6^ /4a), (23) 
-oo 

we find that the messages mij(xj) are proportional to normal distribution with precision and 
mean 

= -4^' (24) 

f^ij = -Pt^^AjfJ-i\j- (25) 

These two scalars represent the messages propagated in the Gaussian BP-based algorithm. 
Finally, computing the product rule (fT4l) is similar to the calculation of the previous prod- 
uct (|20l) and the resulting mean ([22)) and precision (|2T)) . but including all incoming messages. 
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The marginals are inferred by normalizing the result of this product. Thus, the marginals are 
found to be Gaussian probability density functions A/'(/Uj, P,^^) with precision and mean 

l^i = (y PiifJ'ii + PkifJ'ki ) , (27) 

fcGN(i) 

respectively. The derivation of the GaBP-based solver algorithm is concluded by simply 
substituting the explicit derived expressions of Pi\j (|2T]) into Pij (|24l) . (|22|) and Pij (l24l) 
into fiij ([25]) and Pi\j dZB into fXi (|27|) . 

The message passing in the GaBP solver can be performed subject to any scheduling. 
We refer to two conventional messages updating rules: parallel (flooding or synchronous) 
and serial (sequential, asynchronous) scheduling. In the parallel scheme, messages are stored 
in two data structures: messages from the previous iteration round, and messages from the 
current round. Thus, incoming messages do not affect the result of the computation in the 
current round, since it is done on messages that were received in the previous iteration round. 
Unlike this, in the serial scheme, there is only one data structure, so incoming messages in 
this round, change the result of the computation. In a sense it is exactly like the difference 
between the Jacobi and Guess-Seidel algorithms, to be discussed in the following. Some 
in-depth discussions about parallel vs. serial scheduling in the BP context can be found in 
the work of Elidan et al. [19]. 

D. Max-Product Rule 

A well-known alternative version to the sum-product BP is the max-product (a.k.a. min- 
sum) algorithm [20]. In this variant of BP, maximization operation is performed rather than 
marginalization, i.e. , variables are eliminated by taking maxima instead of sums. For Trellis 
trees (e.g. , graphically representing convolutional codes or ISI channels), the conventional 
sum-product BP algorithm boils down to performing the BCJR algorithm [21], resulting in 
the most probable symbol, while its max-product counterpart is equivalent to the Viterbi 
algorithm [22], thus inferring the most probable sequence of symbols [18]. 

In order to derive the max-product version of the proposed GaBP solver, the integral(sum)- 
product rule ([T3l) is replaced by a new rule 

mij{xj) (X aTgmaxipij{xi,Xj)(j)i{xi) Y\_ mki{x.^dxi. (28) 

feGN(»)\j 



Algorithm 1: 



1 . 


Initialize : 


/ 


Set the neighborhood N(i) to include 








Vfc / ^3Ak^ / 0. 






/ 


Set the scalar fixes 








Pa = An and fiu = bi/An, Vi. 






/ 


Set the initial N(i) B k i scalar messages 








Pki — and fiki — 0. 






/ 


Set a convergence threshold e. 


2 . 


Iterate : 


/ 


Propagate the N(j) 3 k i messages 








Pki and ^fci, Vi (under certain scheduling). 






/ 


Compute the N(j) B i j scalar messages 








Pij = —Aijl [Pii + X]fcgN{i)\j -Pfei)' 








~ {Piil-l-ii + X]ftGN(i)\j Pkil^ki) / Aij . 


3. 


Check: 


/ 


If the messages Pij and /iij did not 








converge (w.r.t. e) , return to Step 2. 






/ 


Else, continue to Step 4 . 


4 . 


In fer: 


/ 


Compute the marginal means 








Mi ~ {PiifJ-ii + X]fcgN(i) PkifJ-ki) / {Pii + X]fcgN(i) -Pfci)' 






(/ 


Optionally compute the marginal precisions 








Pi ~ Pa + X]fcgN{i) ) 


5. 


Sol ve : 


/ 


Find the solution 








X* = fii, Mi. 



Computing mij{xj) according to this max-product rule, one gets (the exact derivation is 
deferred to Appendix Hill) 

mi,{xj) oc Ar(/i,, = -Pr^A,jfx,\j, Pr^ = -Ar^^P^sj), (29) 

which is identical to the messages derived for the sum-product case ((24l) -(l25]). Thus inter- 
estingly, as opposed to ordinary (discrete) BP, the following property of the GaBP solver 
emerges. 

Corollary 11: The max-product (Eq. [28]) and sum-product (Eq. [13]) versions of the GaBP 
solver are identical. 

IV. Convergence and Exactness 

In ordinary BP, convergence does not entail exactness of the inferred probabilities, unless 
the graph has no cycles. Luckily, this is not the case for the GaBP solver. Its underlying 



Gaussian nature yields a direct connection between convergence and exact inference. More- 
over, in contrast to BP the convergence of GaBP is not limited for tree or sparse graphs and 
can occur even for dense (fully-connected) graphs, adhering to certain rules discussed in the 
following. 

We can use results from the literature on probabilistic inference in graphical models [8]- 
[10] to determine the convergence and exactness properties of the GaBP-based solver. The 
following two theorems establish sufficient conditions under which GaBP is guaranteed to 
converge to the exact marginal means. 

Theorem 12: [8, Claim 4] If the matrix A is strictly diagonally dominant, then GaBP 
converges and the marginal means converge to the true means. 

This sufficient condition was recently relaxed to include a wider group of matrices. 

Theorem 13: [9, Proposition 2] If the spectral radius of the matrix A satisfies 

p(|I„-A|)<l, (30) 

then GaBP converges and the marginal means converge to the true means. (The assumption 
here is that the matrix A is first normalized by multiplying with where D = diag{A).) 

A third and weaker sufficient convergence condition (relative to Theorem [T3l) which 
characterizes the convergence of the variances is given in [6, Theorem 2]: For each row in 
the matrix A, if Af^^ > T^j^iAfj then the variances converge. Regarding the means, additional 
condition related to Theorem [13] is given. 

There are many examples of linear systems that violate these conditions, for which GaBP 
converges to the exact means. In particular, if the graph corresponding to the system is acyclic 
(i.e. , a tree), GaBP yields the exact marginal means (and variances [8]), regardless of the 
value of the spectral radius of the matrix [8]. Another example, where the graph is fully- 
connected, is discussed in the following section. However, in contrast to conventional iterative 
methods derived from linear algebra, understanding the conditions for exact convergence and 
quantifying the convergence rate of the GaBP solver remain intriguing open problems. 

A. Convergence Acceleration 

Further speed-up of GaBP can be achieved by adapting known acceleration techniques from 
linear algebra, such Aitken's method and Steffensen's iterations [16]. Consider a sequence 
{xn} (e.g. , obtained by using GaBP iterations) linearly converging to the limit x, and a;„ 7^ x 
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for n > 0. According to Aitken's method, if there exists a real number a such that |a| < 1 

and lim„_>oo(a;n — x)/{xn-i — x) = a, then the sequence {y„} defined by 

Vn Z ~~ 

converges to x faster than in the sense that Um„^oo \ ix — yn)/{x — = 0. Aitken's 
method can be viewed as a generalization of over-relaxation, since one uses values from 
three, rather than two, consecutive iteration rounds. This method can be easily implemented 
in GaBP as every node computes values based only on its own history. 

Steffensen's iterations incorporate Aitken's method. Starting with a;„, two iterations are run 
to get Xn+i and Xn+2- Next, Aitken's method is used to compute this value replaces the 
original x„, and GaBP is executed again to get a new value of Xn+i- This process is repeated 
iteratively until convergence. We remark that, although the convergence rate is improved 
with these enhanced algorithms, the region of convergence of the accelerated GaBP solver 
remains unchanged. 

V. Computational Complexity and Message-Passing Efficiency 

For a dense matrix A each node out of the n nodes sends a unique message to every other 
node on the fully-connected graph. This recipe results in a total of messages per iteration 
round. 

The computational complexity of the GaBP solver as described in Algorithm [T] for a dense 
linear system, in terms of operations (multiplications and additions) per iteration round, is 
shown in Table HI In this case, the total number of required operations per iteration is 0{'n?). 
This number is obtained by evaluating the number of operations required to generate a 
message multiplied by the number of messages. Based on the summation expressions for the 
propagating messages Pij and /ijj, it is easily seen that it takes 0{n) operations to compute 
such a message. In the dense case, the graph is fully-connected resulting in 0{n'^) propagating 
messages. 

In order to estimate the total number of operations required for the GaBP algorithm to 
solve the linear system, we have to evaluate the number of iterations required for convergence. 
It is known [23] that the number of iterations required for an iterative solution method is 
where /(k) is a function of the condition number of the data matrix A. Hence the 
total complexity of the GaBP solver can be expressed by 0{'rr') x 0{f{K)). The analytical 
evaluation of the convergence rate function /(/t) is a challenging open problem. However, it 
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Algorithm 


Operations per msg 


msgs 


Total operations per iteration 


Naive GaBP (Algorithm [ij 


0{n) 






Broadcast GaBP (Algorithmic 


0{n) 


0{n) 





TABLE I 

Computational complexity of the GaBP solver for dense n x n matrix A. 



can be upper bounded by /(k) < k. furthermore, based on our experimental study, described 
in Section IVIIIl we can conclude that /(k) < ^/k. Thus, the total complexity of the GaBP 
solve in this case is 0{'rr') x 0{^/k). For well-conditioned (as opposed to ill-conditioned) 
data matrices the condition number is 0{\). Thus, for well-conditioned linear systems the 
total complexity is 0{n^), i.e. , the complexity is cubic, the same order as for direct solution 
methods, like Gaussian elimination. 

At first sight, this result may be considered disappointing, with no complexity gain w.r.t. 
direct matrix inversion. Luckily, the GaBP implementation as described in Algorithm [T] is a 
naive one, thus termed naive GaBP. In this implementation we did not take into account the 
correlation between the different messages transmitted from a certain node i. These messages, 
computed by summation, distinct from one another in only two summation terms. 

Instead of sending a message composed of the pair of fiij and Pij, a node can broadcast 
the aggregated sums 

Pi = Pn+ Yl ^^'^ (31) 
feGN(i) 

h = P,'^iPiiHii+ ^ PkifJ'ki)- (32) 

fcGN(i) 

Consequently, each node can retrieve locally the Pi\j (|2T1) and (|22l) from the sums by 
means of a subtraction 

Pi\j Pi Pjii (33) 

= P-i- P.yPjifiji. (34) 

The rest of the algorithm remains the same. On dense graphs, the broadcast version sends 
0{n) messages per round, instead of (9(n^) messages in the GaBP algorithm. 
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Algorithm 2: 



1 . 


Initialize : 


/ 


Set the neighborhood N(i) to include 








Vfc / ^3Ak^ + 0. 






/ 


Set the scalar fixes 








Pa = An and [in = hijAn, Vi. 






/ 


Set the initial i N(j) broadcast messages 








Pi = and /ii = 0. 






/ 


Set the initial N(i) B k ^ i internal scalars 








Pki = and Hki — 0. 






/ 


Set a convergence threshold e. 


2 . 


Iterate: 


/ 


Broadcast the aggregated sum messages 








Pi = Pa + X]fegN(i) 








f^i ~ Pi {Piif^ii ~\~ ^^k^N(i) Pkif^ki\ 








(under certain scheduling). 






/ 


Compute the N(j) B i j internal scalars 








Pij = —Aij/{Pi — Pji), 








t^ij ~ {Pifii ~ Pjif^ji) /^ij ■ 


3. 


Check : 


/ 


If the internal scalars Pij and /iij did not 








converge (w.r.t. e) , return to Step 2. 






/ 


Else, continue to Step 4. 


4 . 


In fer : 


/ 


Compute the marginal means 








/^i ~ {Piil-l'ii + X]fc6N(i) Pki fJ-ki) / {Pii + X]fc6N(i) -ffeO ~ Mi' 






(/ 


Optionally compute the marginal precisions 








Pi = Pa + J2keN(i) Pki ~ Pi ) 


5. 


Sol ve : 


/ 


Find the solution 








X* = fli, Vi. 



VI. The GaBP-Based Solver and Classical Solution Methods 

A. Gaussian Elimination 

Proposition 14: The GaBP-based solver (Algorithm [T]) for a system of linear equations 
represented by a tree graph is identical to the renowned Gaussian elimination algorithm (a.k.a. 
LU factorization, [23]). 

Proof: [Proof] Consider a set of n linear equations with n unknown variables, a unique 
solution and a tree graph representation. We aim at computing the unknown variable associ- 
ated with the root node. Without loss of generality as the tree can be drawn with any of the 
other nodes being its root. Let us enumerate the nodes in an ascending order from the root 
to the leaves (see, e.g.. Fig. |2l). 




As in a tree each child node (i.e. , all nodes but the root) has only one parent node and 
based on the top-down ordering, it can be easily observed that the tree graph's corresponding 
data matrix A must have one and only one non-zero entry in the upper triangular portion 
of its columns. Moreover, for a leaf node this upper triangular entry is the only non-zero 
off-diagonal entry in the whole column. See, for example, the data matrix associated with 
the tree graph depicted in Fig [2] 
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Ai3 








\ 











A24 
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^13 





A33 




















A44 












^25 








^5 


/ 



where the non-zero upper triangular entries are in bold and among these the entries corre- 
sponding to leaves are underlined. 

Now, according to GE we would like to lower triangulate the matrix A. This is done by 
eliminating these entries from the leaves to the root. Let / be a leaf node, i be its parent 
and j be its parent (Z'th node grandparent). Then, the /'th row is multiplied by —An/Au and 
added to the i'th row. in this way the An entry is being eliminated. However, this elimination, 
transforms the i'th diagonal entry to be An — > An — A\/ An, or for multiple leaves connected 
to the same parent An An - Y^iem)] ^h/Au- In our example. 
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/ 



Thus, in a similar manner, eliminating the parent i yields the multiplication of the j'th 
diagonal term by —Ajj/ {An — X]ieN{j)j ^h/^u)- Recalling that Pa = An, we see that the last 
expression is identical to the update rule of Pij in GaBP. Again, in our example 
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^55 





(37) 



where B = A^x- A\^I{A22- A\^l Aj.-^- A\J A^^- Al^l A^-^), C = A22-AIJA33-AIJAU- 
A25/A55. Now the matrix is fully lower triangulated. To put differently in terms of GaBP, 
the Pij messages are subtracted from the diagonal Pa terms to triangulate the data matrix of 
the tree. Performing the same row operations on the right hand side column vector b, it can 
be easily seen that we equivalently get that the outcome of the row operations is identical 
to the GaBP solver's update rule. These updadtes/row operations can be repeated, in the 
general case, until the matrix is lower triangulated. 

Now, in order to compute the value of the unknown variable associated with the root node, 
all we have to do is divide the first diagonal term by the transformed value of 61, which is 
identical to the infer stage in the GaBP solver (note that by definition all the nodes connected 
to the root are its children, as it does not have parent node). In the example 
. Au-AIJ{A22- Als /A33 - Al^ /A44 - Al^ /A55 ) 



(38) 



hu - ^12/(622 - ^13M33 - ^24M44 - ^25/^55) 

Note that the rows corresponding to leaves remain unchanged. 

To conclude, in the tree graph case, the 'iterative' stage (stage 2 on algorithm [T]) of the 
GaBP solver actually performs lower triangulation of the matrix, while the 'infer' stage (stage 
4) reducers to what is known as forward substitution. Evidently, using an opposite ordering, 
one can get the complementary upper triangulation and back substitution, respectively. 



It is important to note, that based on this proposition, the GaBP solver can be viewed as 
GE ran over an unwrapped version (i.e. , a computation tree) of a general loopy graph. 



B. Iterative Methods 

Iterative methods that can be expressed in the simple form 

x(*) = Bx(*-i) + c, 



(39) 



where neither the iteration matrix B nor the vector c depend upon the iteration number t, are 
called stationary iterative methods. In the following, we discuss three main stationary iterative 
methods: the Jacobi method, the Gauss-Seidel (GS) method and the successive overrelaxation 
(SOR) method. The GaBP-based solver, in the general case, can not be written in this form, 
thus can not be categorized as a stationary iterative method. 

Proposition 15: [23] Assuming I — B is invertible, then the iteration [39] converges (for 
any initial guess, x*^°)). 

C. Jacobi Method 

The Jacobi method (Gauss, 1823, and Jacobi 1845, [2]), a.k.a. the simultaneous iteration 
method, is the oldest iterative method for solving a square linear system of equations Ax = b. 
The method assumes that Vj An ^ 0. It's complexity is per iteration. A sufficient 

convergence condition for the Jacobi method is that for any starting vector xq as long as 
p{D-\L + U)) < 1. Where D = diag{A}, L, U are upper and lower triangular matrices 
of A. A second sufficient convergence condition is that A is diagonally dominant. 

Proposition 16: The GaBP-based solver (Algorithm [I]) 

1) with inverse variance messages arbitrarily set to zero, i.e. , Pij = 0, z G N(j), Vj; 

2) incorporating the message received from node j when computing the message to be 
sent from node i to node j, i.e. replacing k E N(i)\j with k G N(i), 

is identical to the Jacobi iterative method. 

Proof: [Proof] Arbitrarily setting the precisions to zero, we get in correspondence to 
the above derivation, 

Pi\j = Pu = Au, (40) 

-Pijl^ij A^jfl^x^jj (41) 

/ii = A^\bi- J2 (42) 

fceN(i) 

Note that the inverse relation between Pij and Pi\j (|24l) is no longer valid in this case. 
Now, we rewrite the mean (|22|) without excluding the information from node j, 

= Aj^{hi - ^ Aki^ik\d- (43) 

fceN(i) 

Note that = yUj, hence the inferred marginal mean jii can be rewritten as 

Hi = AT^\k - ^kiPk), (44) 



where the expression for all neighbors of node i is replaced by the redundant, yet identical, 
expression k ^ i. This fixed-point iteration is identical to the renowned Jacobi method. 



Proposition \T6\ can be viewed also as a probabilistic proof of Jacobi. The fact that Jacobi 
iterations can be obtained as a special case of the GaBP solver further indicates the richness 
of the proposed algorithm. Note, that the GaBP algorithm converges to the exact solution 
also for nonsymmetric matrices in this form. 

D. Gauss-Seidel 

The Gauss-Seidel method converges for any starting vector Xq if p{{L + D)^^U) < 1. 

This condition holds, for example, for diagonally dominant matrices as well as for positive 
definite ones. It is necessary, however, that the diagonal terms in the matrix are greater (in 
magnitude) than the other terms. 

The successive overrelaxation (SOR) method aims to further refine the Gauss-Seidel method, 
by adding a damping parameter < « < 1: 



where GSi is the Gauss-Seidel update computed by node i. Damping has previously shown 
to be a heuristic for accelerating belief propagation as well [24]. 

VII. Distributed Iterative Computation of Moore-Penrose Pseudoinverse 

In this section, we efficiently extend the applicability of the proposed GaBP-based solver 
for systems with symmetric matrices to systems with any square (i.e. , also nonsymmetric) or 
rectangular matrix. We first construct a new symmetric data matrix R based on an arbitrary 
(non-rectangular) matrix S G M'^^'^ 



Additionally, we define a new vector of variables x = {x^, z^}^ G M(^+")^i, where x G R^^^ 
is the (to be shown) solution vector and z G M"^Ms an auxiliary hidden vector, and a new 
observation vector y = {0^,y^}^ G 

Now, we would like to show that solving the symmetric linear system Rx = y, taking 
the first k entries of the corresponding solution vector x is equivalent to solving the original 
(not necessarily symmetric) system Rx = y. Note that in the new construction the matrix R 



concluding the proof. 



x\ = (xx\~^ + (1 - a)GS', 



(45) 




(46) 



is sparse again, and has only 2nk off-diagonal nonzero elements. When running the GaBP 
algorithm we have only 2nk messages, instead of n"^ in the previous construction. 
Writing explicitly the symmetric linear system's equations, we get 

X + S^z = 0, 



Thus, 



and extracting x we have 



Sx - \I/z = y. 



x = v|/S'^(y-Sx), 



x=(S^S + M/„)-iS^y 



Note, that when the noise level is zero, \E' = Ornxm, we get the Moore-Penrose pseudoinverse 
solution 

x = (S^S)-iS^y = Sty. 



VIII. Numerical Examples and Applications 

Our experimental study includes four numerical examples and two possible applications. 
In all examples, but the Poisson's equation [58l b is assumed to be an m-length all-ones 
observation vector. For fairness in comparison, the initial guess in all experiments, for the 
various solution methods under investigation, is taken to be the same and is arbitrarily set to 
be equal to the value of the vector b. The stopping criterion in all experiments determines 
that for all propagating messages (in the context the GaBP solver) or all n tentative solutions 
(in the context of the compared iterative methods) the absolute value of the difference should 
be less than e < 10^^. As for terminology, in the following performing GaBP with parallel 
(flooding or synchronous) message scheduling is termed 'parallel GaBP', while GaBP with 
serial (sequential or asynchronous) message scheduling is termed 'serial GaBP'. 



A. Numerical Example: Toy Linear System: 3x3 Equations 
Consider the following 3x3 linear system 
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(47) 



We would like to find the solution to this system, x 
matrix A, we directly solve 

( -1/12 -1/6 1/4 \ ^ 



X 

y* 



\ 



-1/6 2/3 1/2 
1/4 1/2 1/4 



{x*,y*,z*Y . Inverting the data 



2 



(48) 



X* A 1 b 

Alternatively, we can now run the GaBP solver. Fig. [3] displays the graph, corresponding 
to the data matrix A, and the message-passing flow. As Ay^ = A^y = 0, this graph is a cycle- 
free tree, thus GaBP is guaranteed to converge in a finite number of rounds. As demonstrated 
in the following, in this example GaBP converges only in two rounds, which equals the tree's 
diameter. Each propagating message, rriij, is described by two scalars jiij and Pij, standing 
for the mean and precision of this distribution. The evolution of the propagating means and 
precisions, until convergence, is described in Table I VIII- A [ where the notation t = 0, 1, 2, 3 
denotes the iteration rounds. Converged values are written in bold. 
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TABLE II 

Evolution of means and precisions on a tree with three nodes 



Next, following the GaBP solver algorithm, we infer the marginal means. For exposition 
purposes we also present in Table UlI] the tentative solutions at each iteration round. 

Thus, as expected, the GaBP solution x* = {x* = l,y* = 2,z* = —1}^ is identical to 
what is found taking the direct approach. Note that as the linear system is described by a 
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TABLE m 

Tentative means computed on each iteration until convergence 



tree graph, then for this particular case, the inferred precision is also exact 



p _|_ p _|_ p = _i2 



P + P 



xy 
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P + P 



3/2, 



4. 



and gives {p-' = {A-'},, = -1/12, P;' = {A-^},, = 2/3, p-' = {A-'}, 
i.e. the true diagonal values of the data matrix's inverse, A^^. 



(49) 
(50) 
(51) 
(52) 

1/4}^, 




Fig. 3. A tree topology with three nodes 



B. Application Example: Linear Detection 

Consider a discrete-time channel with a real input vector x = {xi, . . . ,xk}^ governed by 
an arbitrary prior distribution, Py^, and a corresponding real output vector y = {yi, . . . , Uk}^ = 
/{x^} G Here, the function /{■} denotes the channel transformation. By definition. 



^An extension to the complex domain is straightforward. 



/4 

linear detection compels the decision rule to be 

X = A{x*} = A{A-^b}, (53) 

where b = y is the K x 1 observation vector and the K x K matrix A is a positive-definite 
symmetric matrix approximating the channel transformation. The vector x* is the solution 
(over M) to Ax = b. Estimation is completed by adjusting the (inverse) matrix-vector product 
to the input alphabet, dictated by Py^, accomplished by using a proper clipping function A{-} 
{e.g. , for binary signaling A{-} is the sign function). 

For example, linear channels, which appear extensively in many applications in commu- 
nication and data storage systems, are characterized by the linear relation 

y = /{x} = Rx + n, (54) 

where n is a if x 1 additive noise vector and R = S^S is a positive-definite symmetric 
matrix, often known as the correlation matrix. The N x K matrix S describes the physical 
channel medium while the vector y corresponds to the output of a bank of filters matched 
to the physical channel S. 

Due to the vast applicability of linear channels, in Section IVIIII we focus in our experimental 
study on such channels, although our paradigm is not limited to this case. Assuming linear 
channels with AWGN with variance cr^ as the ambient noise, the general linear detection 
rule (1531) can describe known linear detectors. For example [25], [26]: 

• The conventional matched filter (MF) detector is obtained by taking A = and b = y. 
This detector is optimal, in the MAP-sense, for the case of zero cross-correlations, i.e. , 
R = lii", as happens for orthogonal CDMA or when there is no ISI effect. 

• The decorrelator (zero forcing equalizer) is achieved by substituting A = R and b = y. 
It is optimal in the noiseless case. 

• The linear minimum mean-square error (MMSE) detector can also be described by 
using A = R + (j'^Ik and b = y. This detector is known to be optimal when the input 
distribution Px is Gaussian. 

In general, linear detection is suboptimal because of its deterministic underlying mechanism 
(i.e. , solving a given set of linear equations), in contrast to other estimation schemes, such as 
MAP or maximum likelihood, that emerge from an optimization criterion. In the following 
section we implement the linear detection operation, in its general form (|53]) . in an efficient 
message-passing fashion. 



The essence of detection theory is to estimate a hidden input to a channel from empirically- 
observed outputs. An important class of practical sub-optimal detectors is based on linear 
detection. This class includes, for instance, the conventional single-user matched filter, the 
decorrelator (also, called the zero-forcing equalizer), the linear minimum mean-square error 
(MMSE) detector, and many other detectors with widespread applicability [25], [26]. In 
general terms, given a probabilistic estimation problem, linear detection solves a deterministic 
system of linear equations derived from the original problem, thereby providing a sub-optimal, 
but often useful, estimate of the unknown input. 

Applying the GaBP solver to linear detection, we establish a new and explicit link between 
BP and linear detection. This link strengthens the connection between message-passing infer- 
ence and estimation theory, previously seen in the context of optimal maximum a-posteriori 
(MAP) detection [27], [28] and several sub-optimal nonlinear detection techniques [29] 
applied in the context of both dense and sparse [30], [31] graphical models. 

In the following experimental study, we examine the implementation of a decorrelator 
detector in a noiseless synchronous CDMA system with binary signaling and spreading 
codes based upon Gold sequences of length 771 = 70 Two system setups are simulated, 
corresponding to n = 3 and n = 4 users, resulting in the cross-correlation matrices 
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respectively^ 

The decorrelator detector, a member of the family of linear detectors, solves a system of 
linear equations. Ax = b, where the matrix A is equal to the n x n correlation matrix R, 
and the observation vector b is identical to the n-length CDMA channel output vector y. 



''in this case, as long as the system is not overloaded, i.e. the number of active users n is not greater than the spreading 
code's length m, the decorrelator detector yields optimal detection decisions. 

^ These particular correlation settings were taken from the simulation setup of Yener etaJ. [15]. 



lb 

Thus the vector of decorrelator decisions is determined by taking the signum of the vector 
A^^b = R^^y. Note that R3 and R4 are not strictly diagonally dominant, but their spectral 
radii are less than unity, since pdig - R3I) = 0.9008 < 1 and p(|l4 - R4I) = 0.8747 < 1, 
respectively. In all of the experiments, we assumed the output vector was the all-ones vector. 

Table UV] compares the proposed GaBP algorithm with standard iterative solution meth- 
ods [2] (using random initial guesses), previously employed for CDMA multiuser detectors 
(MUD). Specifically, MUD algorithms based on the algorithms of Jacobi, Gauss-Seidel (GS) 
and (optimally weighted) successive over- relaxation (SOrJ] were investigated [13], [14]. The 
table lists the convergence rates for the two Gold code-based CDMA settings. Convergence 
is identified and declared when the differences in all the iterated values are less than 10^^. 
We see that, in comparison with the previously proposed detectors based upon the Jacobi and 
GS algorithms, the GaBP detectors converge more rapidly for both n = 3 and n = A. The 
serial (asynchronous) GaBP algorithm achieves the best overall convergence rate, surpassing 
even the SOR-based detector. 




Fig. 4. Convergence of the GaBP algorithm vs. Jacobi on a 3 x 3 gold CDMA matrix. Each dimension shows one 
coordinate of the solution. Jacobi converges in zigzags while GaBP has spiral convergence. 



Further speed-up of GaBP can be achieved by adapting known acceleration techniques from 
linear algebra, such Aitken's method and Steffensen's iterations [16]. Consider a sequence 
{xn} (e.g. , obtained by using GaBP iterations) linearly converging to the limit x, and a;„ 7^ x 
for n > 0. According to Aitken's method, if there exists a real number a such that |a| < 1 

^This moving average improvement of Jacobi and GS algorithms is equivalent to what is known in the BP literature as 
'damping' [24]. 




Fig. 5. Convergence of the two gold CDMA matrices. To the left R3, to the right, R4. 
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17 


14 
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TABLE IV 

Decorrelator for K = Z, 4-user, N = 7 Gold code CDMA. Total number of iterations required for 

CONVERGENCE (THRESHOLD e = 10~^) FOR GABP-BASED SOLVERS VS. STANDARD METHODS. 



and lim„_>oo(a;n — S:)/{xn-i — x) — a, then the sequence defined by 

Vn — 7: ~j~ 

converges to x faster than {xn} in the sense that lim^^oo |(^ — yn)/{x — = 0. Aitken's 
method can be viewed as a generalization of over-relaxation, since one uses values from 
three, rather than two, consecutive iteration rounds. This method can be easily implemented 
in GaBP as every node computes values based only on its own history. 
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TABLE V 

Decorrelatorfor K ^ 3, 4-user, N ^ 7 Gold code CDMA. Total number of iterations required for 

CONVERGENCE (THRESHOLD e = 10"'') FOR JACOBI, PARALLEL AND SERIAL GaBP SOLVERS ACCELERATED BY 

Steffensen iterations. 




Fig. 6. Convergence acceleration of tlie GaBP algorithm using Aitken and Steffensen methods. The left graph depicts a 
3x3 gold CDMA matrix, the right graph 4x4 gold CDMA matrix. Further details regarding simulation setup are found 
in Section rvnTBl 



Steffensen's iterations incorporate Aitken's method. Starting with Xn, two iterations are run 
to get Xn+i and Xn+2- Next, Aitken's method is used to compute y„, this value replaces the 
original x„, and GaBP is executed again to get a new value of Xn+i- This process is repeated 
iteratively until convergence. Table |V] demonstrates the speed-up of GaBP obtained by using 
these acceleration methods, in comparison with that achieved by the similarly modified Jacobi 



algorithm^ We remark that, although the convergence rate is improved with these enhanced 
algorithms, the region of convergence of the accelerated GaBP solver remains unchanged. 

For the algorithms we examined. Fig. l-(a) displays the Euclidean distance between the 
tentative (intermediate) results and the fixed-point solution as a function of the number of 
iterations. As expected, all linear algorithms exhibit a logarithmic convergence behaviour. 
Note that GaBP converges faster on average, although there are some fluctuations in the 
GaBP curves, in contrast to the monotonicity of the other curves. 

An interesting question concerns the origin of this convergence speed-up associated with 
GaBP. Better understanding may be gained by visualizing the iterations of the different 
methods for the matrix R3 case. The convergence contours are plotted in the space of 
{xi,X2,X3} in Fig. l-(b). As expected, the Jacobi algorithm converges in zigzags towards 
the fixed point (this behavior is well-explained in Bertsekas and Tsitsiklis [23]). The fastest 
algorithm is serial GaBP. It is interesting to note that GaBP convergence is in a spiral shape, 
hinting that despite the overall convergence improvement, performance improvement is not 
guaranteed in successive iteration rounds. The spiral nature of GaBP convergence is better 
viewed in Fig. l-(c). In this case the system was simulated with a specific R matrix for 
which Jacobi algorithm and other standard methods did not even converge. Using Aitken's 
method, a further speed-up in GaBP convergence was obtained. 

Despite the fact that the examples considered correspond to small multi-user systems, 
we believe that the results reflect the typical behavior of the algorithms, and that similar 
qualitative results would be observed in larger systems. In support of this belief, we note, 
in passing, that GaBP was experimentally shown to converge in a logarithmic number of 
iterations in the cases of very large matrices both dense (with up to hundreds of thousands 
of dimensions [33]) and sparse (with up to millions of dimensions [34], [35]). 

As a final remark on the linear detection example, we mention that, in the case of a channel 
with Gaussian input signals, for which linear detection is optimal, it can be easily shown 
that the proposed GaBP scheme reduces to the BP-based MUD scheme, recently introduced 
by Montanari et al. [36]. Their BP scheme, tailored specifically for Gaussian signaling, has 
been proven to converge to the MMSE (and optimal) solution for any arbitrarily loaded, 

^Application of Aitken and Steffensen's methods for speeding-up the convergence of standard (non-BP) iterative solution 
algorithms in the context of MUD was introduced by Leibig et al. [32]. 



randomly-spread CDMA system (i.e. , a system where — R) ^ 1). 1^ Thus Gaussian- 
input additive white Gaussian noise CDMA is another example for which the proposed GaBP 
solver converges to the MAP decisions for any mxn random spreading matrix S, regardless 
of the spectral radius. 

C. Numerical Example: Symmetric Non-PSD Indefinite) Data Matrix 

Consider the case of a linear system with a symmetric, but non-PSD data matrix 

1 2 3 ^ 

2 2 1 . (57) 
^3 1 1 j 

Table |VI] displays the number of iterations required for convergence for the iterative meth- 
ods under consideration. The classical methods diverge, even when aided with acceleration 
techniques. This behavior (at least without the acceleration) is not surprising in light of 
Theorem [131 Again we observe that serial scheduling of the GaBP solver is superior parallel 
scheduling and that applying Steffensen iterations reduces the number of iterations in 45% 
in both cases. Note that SOR cannot be defined when the matrix is not PSD. By definition 
CG works only for symmetric PSD matrices, because the solution is a saddle point and not 
a minimum or maximum. 

IX. Application Example: 2-D Poisson's Equation 

One of the most common partial differential equations (PDEs) encountered in various areas 
of exact sciences and engineering {e.g. , heat flow, electrostatics, gravity, fluid flow, quantum 
mechanics, elasticity) is Poisson's equation. In two dimensions, the equation is 

/^u{x,y) = f{x,y), (58) 

for {x, y} E ri, where 

92(-) 92(-) 

is the Laplacean operator and ^7 is a bounded domain in M?. The solution is well defined 
only under boundary conditions, i.e. , the value of y) on the boundary of Vt is specified. 
We consider the simple (Dirichlet) case of u{x, y) = for {x,y} on the boundary of Vl. This 



'For non-Gaussian signaling, e.g. with binary input alphabet, this BP-based detector is conjectured to converge only in 
the large-system limit, as n, m ^ oo [36]. 
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Iterations t 


Jacobi,GS.SR,Jacobi+Ailkcns,Jacobi+Slcfl'cnscn 


- 


Parallel GaBP 


38 


Serial GaBP 


25 


Parallel GaBP+Steffensen 


21 


Serial GaBP+Steffensen 
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TABLE VI 

Symmetric non-PSD 3x3 data matrix. Total number of iterations required for convergence 
(threshold € = 10~®) FOR GaBP-based solvers vs. standard methods. 




10"'^ 

2 4 6 8 10 

Iteration t 

Fig. 7. Convergence rate for a 3 x 3 symmetric non-PSD data matrix. The Frobenius norm of the residual per equation, 
I Ax' — bWr/n, as a function of the iteration t for OS (triangles and solid line), Jacobi (squares and solid line), SR (stars 
and solid line), parallel GaBP (circles and solid line) and serial GaBP (circles and dashed line) solvers. 

equation describes, for instance, the steady-state temperature of a uniform square plate with 
the boundaries held at temperature u = 0, and f{x, y) equaling the external heat supplied at 
point {x,y}. 

The poisson's PDE can be discretized by using finite differences. An p + 1 x p + 1 square 
grid on Q, with size (arbitrarily) set to unity is used, where h= l/(p + 1) is the grid spacing. 



S2. 




Fig. 8. The left graph depicts accelerated convergence rate for a 3 x 3 symmetric non-PSD data matrix. The Frobenius 
norm of the residual per equation, ||Ax' — 6||F/n, as a function of the iteration t for Aitkens (squares and solid line) 
and Steffensen-accelerated (triangles and soUd line) Jacobi method, parallel GaBP (circles and solid line) and serial GaBP 
(circles and dashed Une) solvers accelerated by Steffensen iterations. The right graph shows a visualization of parallel GaBP 
on the same problem, drawn in R^. 




10 20 30 40 50 60 70 80 90 100 
Index j 



Fig. 9. Image of the corresponding sparse data matrix for the 2-D discrete Poisson's PDE with p = 10. Empty (full) 
squares denote non-zero (zero) entries. 



We let U{i,j), {i,j — 0,...,p+ 1}, be the approximate solution to the PDE at x — ih and 



y — jh. Approximating the Laplacean by 



£/(z + l,j)-2^(z,j) + £/(i-l,j) ^ ^(z,j- + l)-2^(z,j) + ^(i,j-l) 



h? 



(60) 



one gets the system of n — hnear equations with n unknowns 

4C/(i, j) - U{i -l^j)-U{i + l, j) - U{i, J - 1) - U{i, J + 1) = b{i, J = 1, . . . , p, (61) 

where b{i,j) = —f(ih,jh)h'^, the scaled value of the function f{x,y) at the corresponding 
grid point Evidently, the accuracy of this approximation to the PDE increases with n. 




Fig. 10. Convergence rate for a 3 x 3 symmetric non-PSD data matrix. The Frobenius norm of the residual per equation, 
1 1 Ax* — bWp/n, as a function of the iteration t for GS (triangles and solid line), Jacobi (squares and solid line), SR (stars 
and solid line), parallel GaBP (circles and solid line) and serial GaBP (circles and dashed line) solvers. 



Choosing a certain ordering of the unknowns U {i, j), the linear system can be written in a 
matrix-vector form. For example, the natural row ordering (i.e. , enumerating the grid points 
left— >right, bottom— >up) leads to a linear system with x sparse data matrix A. For 
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Fig. 11. The left graph depicts accelerated convergence rate for a 3 x 3 symmetric non-PSD data matrix. The Frobenius 
norm of the residual per equation, ||Ax* — b\\F/n, as a function of the iteration t for Aitkens (squares and solid line) 
and Steffensen-accelerated (triangles and solid line) Jacobi method, parallel GaBP (circles and solid line) and serial GaBP 
(circles and dashed Une) solvers accelerated by Steffensen iterations. The right graph shows a visualization of parallel GaBP 
on the same problem, drawn in 'M? . 



example, a Poisson PDE with p — Z generates the following 9x9 linear system 



/ 4 -1 


-1 








' f/(l,l) ^ 




^6(1,1) ^ 


-1 4 -1 


-1 








[/(2,1) 
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-1 4 


-1 
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-1 


4 -1 


-1 






C/(l,2) 




6(1,2) 


-1 


-1 4 -1 


-1 
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-1 4 




-1 
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6(3,2) 




-1 


4 -1 






C/(l,3) 




6(1,3) 




-1 


-1 4 






[/(2,3) 




6(2,3) 


v 


-1 


-1 






t/(3,3) y 




1^6(3,3) , 




A 






X 


b 



where blank data matrix A entries denote zeros. 



Hence, now we can solve the discretized 2-D Poisson's PDE by utilizing the GaBP 
algorithm. Note that, in contrast to the other examples, in this case the GaBP solver is 
applied for solving a sparse, rather than dense, system of linear equations. 

In order to evaluate the performance of the GaBP solver, we choose to solve the 2-D 
Poisson's equation with discretization of p = 10. The structure of the corresponding 100 x 100 
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TABLE VII 

2-D DISCRETE POISSON'S PDE WITH p = 3 AND f{x,y) = — 1. TOTAL NUMBER OF ITERATIONS REQUIRED EOR 
CONVERGENCE (THRESHOLD e = 10~") FOR GaBP-BASED SOLVERS VS. STANDARD METHODS. 



sparse data matrix is illustrated in Fig. [9l 

X. Conclusion and Future Directions 

In this paper, we have established a powerful new connection between the problem of 
solving a system of linear equations and probabilistic inference on a suitable Gaussian 
graphical model. Exploiting this connection, we have developed an iterative, message-passing 
algorithm based upon Gaussian BP that can be used as a low-complexity alternative to linear 
algebraic solutions based upon direct matrix inversion. By its nature, the new algorithm allows 
an efficient, distributed implementation. 

There are numerous applications in mathematics and engineering to which the new GaBP- 
based algorithm is applicable. We illustrated its potential performance advantages in the 
context of linear detection [7], [12], but many other techniques in digital communications 
requiring matrix inversion or determinant computation, such as channel precoding [37], are 
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Fig. 12. Accelerated convergence rate for the 2-D discrete Poisson's PDE with p = 10 and f{x, y) = —1. The Frobenius 
norm of the residual, per equation, ||Ax* — b\\F/n, as a function of the iteration t for parallel GaBP solver accehated 
by Aitkens method (o-marks and solid Une) and serial GaBP solver accelerated by Steffensen iterations (left triangles and 
solid line) 
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Fig. 13. Convergence of an asymmetric 3x3 matrix. 

amenable to the new approach. 

The extension of this technique to systems of linear equations over finite fields would open 
up a wealth of other applications. For example, such a development could lead to an iterative, 
message-passing algorithm for efficient decoding of algebraic error-correcting codes, like the 
widely-used class of BCH and Reed-Solomon codes. 
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Parallel GaBP 
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TABLE VIII 

Asymmetric 3x3 data matrix, total number of iterations required for convergence (threshold 
t = 10"'') FOR GaBP-based solvers vs. standard methods. 




Fig. 14. Convergence of a 3 x 3 asymmetric matrix, using 3D plot. 



Appendix I 
Proof of Lemma [TQ] 

Proof: Taking the product of the two Gaussian probability density functions 



h{x)h{x) = exp ( - (Pi(x - + P^ix - /i2)^)/2) (63) 

and completing the square, one gets 

fi{x)f2{x) = ^ exp ( - P(x - /i)V2), (64) 



with 



P = P1 + P2, (65) 



^ P-\filP,+fl2P2) (66) 

and the scalar constant determined by 



C ^ s^p-p-^ exp {[P,iil{p-^P, - 1) + P2l^l{P-'P2 - 1) + 2P-iPiP2/xi/i2)/2). (67) 
Hence, the product of the two Gaussian densities is C ■ A/'(/i, P^^). ■ 



Appendix II 
Integrating over x,- 



Proof: 



mij{xj) oc / ^pij{xi,Xj)(f)i{xi) Y\ rnki{xi)dxi (68) 



fc6N(i)\j 

f / 

OC 



„2 



exp {-XiAi^xf) exp (x^ /2 - ^i\jXi)) dxi (69) 

exp ((-Pi\jX,V2) + {PiXjli-iXi - AijXj)xi)dxi (70) 
cx exp ( iPi\jfii\j - AijXj ) ^ / (2Pi\j ) ) (71) 
oc = -P^'A^fj.^,, Pr^^ = -ATfP^), (72) 

where the exponent (T/TI) is obtained by using the Gaussian integral (|23l) . ■ 

Appendix III 
Maximizing over Xi 

Proof: 

mij{xj) oc aj:gmay.il)ij{xi,Xj)(t)i{xi) J]^ mki{xi) (73) 

fceN(i)\j 

(xi ) 't'ii^^i) nfe6N{«) \j "Ifei (a;* ) 
' " ^' 2 ^ 

oc arg max exp {—XiAijXj) exp {—Pi\j (a^j /2 — fii\jXi)) (74) 
= argmaxexp {{-Pi\jX^j2) + {Pi\jiJ,i\j - AijXj)xi). (75) 



Hence, x™^", the value of Xi maximizing the product 4'ij{xi,Xj)(j)i{xi)Y[k^^(^i-)\jfnki{xi) is 
given by equating its derivative w.r.t. Xj to zero, yielding 



max _ Pi\jf^i\j ^ij^j 



^max ^ ^XJr.,j ^^^^ 

Substituting xf^ back into the product, we get 

mij{xj) oc exp{{Pi\jiJ,i\j - AijXj f/{2Pi\j)) (77) 

oc ^^{^^,, = -P^'A^f^i:^, P^r.^ = -Ar^Pi^). (78) 

which is identical to the result obtained when eliminating xi via integration (1721) . ■ 



Appendix IV 

Quadratic Min-Sum Message Passing algorithm 

The quadratic Min-Sum message passing algorithm was initially presented in [6]. It is a 
variant of the max-product algorithm, with underlying Gaussian distributions. The quadratic 
Min-Sum algorithm is an iterative algorithm for solving a quadratic cost function. Not 
surprisingly, as we have shown in Section IIII-DI that the Max-Product and the Sum-Product 
algorithms are identical when the underlying distributions are Gaussians. In this contribution, 
we show that the quadratic Min-Sum algorithm is identical to the GaBP algorithm, although 
it was was derived differently. 

In [6] the authors discuss the application for solving linear system of equations using the 
Min-Sum algorithm. Our work [7] was done in parallel to their work, where both papers 
appeared in the 45th AUerton conference. 

Theorem 17: The Quadratic Min-Sum algorithm is an instance of the GaBP algorithm. 
Proof: [Proof] We start in the quadratic parameter updates: 



lij = Z ^=^2 ~ ( ■'- ~^ueN{i)\j Tni 'Jui Tm ) 

i - l.ueN{i)\ji- uilui 

Which is equivalent to |2ll Regarding the mean parameters, 



Zij = Z ' ^=^2 ~ '^u£N{i)\jZui) = ^ij lij ( hi — I]ug7v(i)\j ^^uj) 

-L - l^u£N(i)\ji- uilui 

Which is equivalent to 
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TABLE IX 
Notations of Min-Sum [6] vs. GaBP 



Min-Sum [6] 


GaBP [7] 


comments 


h ■ 

Xi 


Hi\j 

Oi 

1 

Xi 

Pi 

A, J 


quadratic parameters / product rule precision from i to j 
linear parameters / product rule mean rom i to j 

prior mean of node i 
prior precision of node i 
posterior mean of node i 
posterior precision of node i 

covariancc of nodes / and j 



For simplicity of notations, we list the different notations in Min-Sum paper vs. our 
notations: As shown above, the Min-Sum algorithm assumes the covariance matrix F is 
first normalized s.t. the main diagonal entries (the variances) are all one. The messages 
sent in the Min-Sum algorithm are called linear parameters (which are equivalent to the 
mean messages in GaBP) and quadratic parameters (which are equivalent to variances). The 
difference between the algorithm is that in the GaBP algorithm, a node computes the product 
rule and the integral, and sends the result to its neighbor. In the Min-Sum algorithm, a node 
computes the product rule, sends the intermediate result, and the receiving node computes 
the integral. In other words, the same computation is performed but on different locations. 
In the Min-Sum algorithm terminology, the messages are linear and quadratic parameters vs. 
Gaussians in our terminology. 
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